Educational Standards


MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations

Lazzaroni, Ruggero Marino, Angioi, Alessandro, Puliga, Michelangelo, Sanna, Davide, Marras, Roberto

arXiv.org Artificial Intelligence

Large language models (LLMs) show increasing potential in education, yet benchmarks for non-English languages in specialized domains remain scarce. We introduce MedBench-IT, the first comprehensive benchmark for evaluating LLMs on Italian medical university entrance examinations. Sourced from Edizioni Simone, a leading preparatory materials publisher, MedBench-IT comprises 17,410 expert-written multiple-choice questions across six subjects (Biology, Chemistry, Logic, General Culture, Mathematics, Physics) and three difficulty levels. We evaluated diverse models, including proprietary LLMs (GPT-4o, Claude series) and resource-efficient open-source alternatives (<30B parameters), focusing on practical deployability. Beyond accuracy, we conducted rigorous reproducibility tests (88.86% response consistency, varying by subject), ordering bias analysis (minimal impact), and reasoning prompt evaluation. We also examined correlations between question readability and model performance, finding a statistically significant but small inverse relationship. MedBench-IT provides a crucial resource for the Italian NLP community, EdTech developers, and practitioners, offering insights into current capabilities and a standardized evaluation methodology for this critical domain.
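The reproducibility and ordering-bias checks described in the abstract can be illustrated with a small sketch. This is not the MedBench-IT implementation — the function names and the stub model below are hypothetical — but it shows the two measurements in principle: response consistency is the fraction of questions answered identically across repeated runs, and an ordering-bias test re-shuffles the answer options while tracking where the correct option lands.

```python
import random

def shuffle_options(options, seed):
    """Shuffle answer options; return the new order and the new index
    of the correct answer (assumed to be options[0] originally)."""
    rng = random.Random(seed)
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    new_correct = order.index(0)  # where the original first option ended up
    return shuffled, new_correct

def response_consistency(run1, run2):
    """Fraction of questions answered identically across two runs."""
    same = sum(a == b for a, b in zip(run1, run2))
    return same / len(run1)

# Hypothetical mock model: always picks the first option, so two runs
# agree perfectly and consistency is 1.0.
questions = [f"Q{i}" for i in range(5)]
answers_run1 = [0] * len(questions)
answers_run2 = [0] * len(questions)
print(response_consistency(answers_run1, answers_run2))  # → 1.0
```

With a real model, one would compare accuracy on the original versus shuffled option orders: a large gap would indicate positional bias, which the paper reports as minimal.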


Evaluating Multimodal Generative AI with Korean Educational Standards

Park, Sanghee, Kim, Geewook

arXiv.org Artificial Intelligence

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), Middle (KoMGED), High (KoHGED), and College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models - open-source, open-access, and closed APIs - by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be made fully open-sourced at https://github.com/naver-ai/KoNET.


Digitizing educational standards to make learning materials reusable across countries

#artificialintelligence

Consider a refugee population coming from country C residing in host country B, with limited or no access to education. The trauma of conflict and displacement, coupled with the difficulty of integration within the host country, puts refugee populations at a significant educational disadvantage, so it is worthwhile considering options that could "level the playing field" by providing improved access to education. There is hope that the vast amounts of Open Educational Resources (OER) freely available on the internet can play a role in this, particularly in combination with educational platforms like Kolibri. The Kolibri platform aims to provide access to learning opportunities for all, and it is particularly suited to the refugee context: the runs-anywhere design of the Kolibri applications allows them to be used in computer labs, in the classroom, on phones, and in informal learning centres. Our experience and work with partners like UNHCR have shown that in emergency and crisis contexts, a key bottleneck is the lack of sufficient educational content aligned to the learning goals of the project.